Fixtive Story :

We are looking for the best player available for our team. We need a central position player with main offensive qualities. Defensive qualities are secondary, as we already have one of the best defences in the league.

Netball DataBase Variables Description:

Some details about the data, let’s review the specific columns when you open the dataset:

  1. Athlete = Names of individual athletes in the dataset. These names are in the format “Surname, Initial”

  2. Statistic = These are different netball stats captured for each player.

  3. Team = This column refers to one of the eight teams the athlete is playing for.

  4. Total = Total of the statistic. For example, the total number of attempts from Zone 1, or the total number of goals, or the total number of minutes played.


Analysis of Center position : Scoring and Performance Analysis

For this analysis, we focused on a general analysis of the following positions: Goal defender, Wing defender, Centre, Wing attack, Goal attack.

NB : In order to fill the lack of data concerning the specific positions of the players (moreover, I do not know this sport at all), we assume in this analysis that all the central positions are one unique position category.

Variables Definitions :

It is necessary to express the playing time data in order to make a relevant comparison.

Activité Offencive : cumul of disposals season divided by minutes played.

Activité Defencive : cumul gains (defections + interceps) divided by minutes played.

Postes : Who he is ? Goal Shooter = 1, Goal Keeper = 2, Other Positions = 0, (filter positions).

# Loading packages :
library(flexdashboard)
library(tidyverse)
library(viridis)
library(ggplot2)
library(plotly)
library(readxl)
library(DT)
library(stats)
library(factoextra)

# Import Raw Data :
DataSet <- read.csv("Data/Dataset3_Assessment3.csv")

# Import Characterization Table .xls :
Variable_ID <- read_excel("Data/Variables_Details.xlsx")

# Cleaning of the DataSet :
Clean_Df <- DataSet %>% mutate(Statistic = str_replace(Statistic,"attempt_from_zone1","attempts1"),
         Statistic = str_replace(Statistic,"attempt_from_zone2","attempts2"),
         Statistic = str_replace(Statistic,"goal_from_zone1","goal1"),
         Statistic = str_replace(Statistic,"goal_from_zone2","goal2")) %>% 
  filter(!Total == 0)

# Merge two Dataframe in One fusion :
Df_Details <- merge(Clean_Df, Variable_ID, by.x ="Statistic", by.y ="Variables",all = TRUE)

# Remove Na because some Statistic variable aren't present in DataSet :
Df_Details <- Df_Details %>% drop_na() %>% 
  # Who is a shooter or not ? Goal Shooter =1, Goal Keeper = 2, Other Positions = 0
  mutate(Poste_played = ifelse(Statistic =="goalAttempts" & Total > 0,1,
                             ifelse(Statistic =="blocks" & Total > 0,2,0)))

Are there differences in game profiles within the same overall position ?

# Activity Variables in a Dataframe filtered :
Activity_DF <- Df_Details %>% filter(Statistic == "minutesPlayed"| # Relative
                               Statistic == "disposals"| # Attaque +
                               Statistic == "badPasses"| # Attaque -
                               Statistic == "interceptPassThrown"| # Attaque -
                               Statistic == "deflections"| # Defense +
                               Statistic == "gain"| # Defense +
                               Statistic == "obstructionPenalties"|  
                               Statistic == "intercepts") # Defense +

# Pivot Table to observe the dependence between variable and choose the good one for this analysis :
Activity_DF <- Activity_DF %>% pivot_wider(names_from = Statistic, values_from = Total) %>% 
  group_by(Athlete,Poste_played) %>% 
  replace(is.na(.), 0) %>% 
  summarise(minutesPlayed = max(minutesPlayed),
            disposals = max(disposals),
            Bad_Disposal = sum(badPasses+interceptPassThrown),
            deflections = max(deflections),
            obstructionPenalties = max(obstructionPenalties),
            gain = max(gain),
            intercepts = max(intercepts),
            Gain_corrected = sum(deflections + intercepts),
          
            # e.g, you can have 2 disposal but 4 Bad, that's mean the calculation of disposal don't count bad tentative, so Full_Disposal :
            Full_Disposal = sum(Bad_Disposal+disposals), 
            Efficiency_Att = round((Full_Disposal - Bad_Disposal)/Full_Disposal,2),
            Full_Deff_Actions = sum(obstructionPenalties+Gain_corrected),
            Efficiency_Def = round((Full_Deff_Actions - obstructionPenalties)/Full_Deff_Actions,2)) %>% 
  
  mutate(Def_Act = round(Gain_corrected/minutesPlayed*4*15,1),
  Off_Act = round(disposals/minutesPlayed*4*15,1)) 

# We fixe a thresold of 3match for this analysis to have a more representative sample :
Activity_DF <- Activity_DF %>% filter(minutesPlayed>180)


# Gain and Gain_corrected are different ?
library(ggpubr) # package to add regression label on the chart

ggplot(data = Activity_DF, mapping = aes(x = gain , y = Gain_corrected))+
  geom_point() +
  geom_smooth(method=lm) +
  stat_regline_equation(label.x=15, label.y=150) + # allowed to show regression equation
  stat_cor(method = "pearson", label.x = 15, label.y = 140)+ #R2 idem
  labs(title = "Graph 0 : Correlation between 'Gain' variable and 'Corrected Value'")

# Profil - Comparaison demande poste en activities offensive et def :
library(scales) 
# Fonction Kmeans clustering :
library(stats) 
# because scale can't take df with non numeric variable, we have to create new df :
df <- data.frame(Activity_DF$Def_Act, Activity_DF$Off_Act) %>% scale()
#We need to scale df because ifnot we can't use kmeans fonction:
kn3 <- kmeans(df, centers = 3, nstart = 25) # I chose 3 clusters
df2 <- data.frame( Athlete_Name = Activity_DF$Athlete,
                   Clusters =kn3$cluster,
                   Activity_Def = Activity_DF$Def_Act,
                   Activity_Off = Activity_DF$Off_Act,
                   Efficiency_Def = Activity_DF$Efficiency_Def,
                   Efficiency_Att = Activity_DF$Efficiency_Att)

# Plots :
p1 <- ggplot(data = df2, mapping = aes(x=Activity_DF$Def_Act,y=Activity_DF$Off_Act)) +
      geom_point() +
      geom_label(mapping = aes(label = Activity_DF$Athlete))+
      labs(y = "Offensive Activity (Actions per 60min = 1game)", x = "Defencive Activity (per 60min = 1game)") +
      labs(title = "Graph 1 : Offensive and Defensive Activity ")
ggplotly(p1)
# Special plot with the package factoextra, to draw cluster on the graph:
fviz_cluster(kn3, data = df,palette = "Set2", ggtheme = theme_minimal()) +
  labs(title = "Graph 2 : Athlete Clustering - Offensive and Defensive Activity ",
         subtitle = "Clustering by Kmeans",
         caption = "This Chart highligth 3 categories : Defencive domination, Offensive Domination and Mixt Profil.") # Title design

Interpretations :

NB : the activity data is expressed in number of offensive or defensive actions per match unit. The metrix was expressed in relation to the athletes time played and expressed in count per match (4*15min) in order to facilitate the understanding of the values.

  1. Graph 0 : in view of the differences between the calculated “Gain” variable and the “Corrected Gain” variable which follows the basic evaluation calculation (sum of passes and shots). It seems relevant to consider the “Corrected gain” variable for the further analysis.

  2. Graph 1 : shows the groupings of athletes with dominantly defensive, dominantly offensive or mixed activities.

  3. Graph 2 : classifies the different athletes into three distinct clusters using the Kmeans method. The data appears to be grouped as “mixed activity”, “attack dominated activity”, “defense dominated activity”

NB : the most interesting profile for our recruitment is therefore the profile named “attack dominated activity”!

Player’s Profitability by clusters model activity :

Profitability is the balance between the amount of work the player does and his efficiency. Having a player with a lot of game action but little efficiency is not interesting for us. Similarly, if a player is very efficient but does not do much in the game, then he does not affect the dynamics of the game.

# Creat Top3 filter on the offencive activity variable:
df3 <- df2 %>% group_by(Clusters) %>%  mutate(top3_thresold = ifelse(rank(-Activity_Off)<=3,"TRUE","FALSE"))

library(formattable) # package to design table
formattable(df3, align =c("l","c","c","c","c","c"),
            Athlete_Name = formatter("span", style = ~ style(color = "black",font.weight = "bold")),
            # Colors choose for colour-blind person
             list(Efficiency_Def = color_bar("#00FF00"),
                  Efficiency_Att = color_bar("#03fc77")))
Athlete_Name Clusters Activity_Def Activity_Off Efficiency_Def Efficiency_Att top3_thresold
Aiken, R 2 0.6 56.8 0.40 0.98 FALSE
Altmann, K 2 1.5 28.9 0.89 0.90 FALSE
Anstiss, J 3 2.2 6.0 0.70 0.88 FALSE
Aryang, S 3 5.5 0.4 0.76 0.33 FALSE
Austin, K 2 3.1 49.0 0.75 0.95 FALSE
Bassett, C 2 0.2 48.7 0.11 0.99 FALSE
Bragg, M 3 4.7 2.2 0.83 0.59 FALSE
Brandley, A 3 4.0 2.0 0.52 0.80 FALSE
Browne, K 2 0.8 29.9 0.45 0.94 FALSE
Browne, M 2 2.6 28.0 0.78 0.86 FALSE
Bruce, C 1 10.8 6.0 0.72 0.82 FALSE
Cassidy, M 2 1.5 42.0 0.86 0.95 FALSE
Charles, V 2 1.8 35.2 0.72 0.94 FALSE
Colyer, I 2 1.3 28.6 0.69 0.94 FALSE
Cosh, E 2 1.1 31.9 0.30 0.94 FALSE
Craig, S 3 3.3 1.4 0.49 0.80 FALSE
Dehaney, K 1 7.8 6.6 0.46 1.00 TRUE
Dunkley, L 2 3.0 34.2 0.89 0.96 FALSE
Dwan, T 2 2.0 43.1 0.76 0.95 FALSE
Eddy, K 3 3.6 1.1 0.65 0.62 FALSE
Fowler, J 2 0.7 63.7 0.43 1.00 TRUE
Francis, S 3 3.9 1.1 0.75 0.39 FALSE
Garbin, S 2 2.6 62.0 0.93 0.95 TRUE
Gardner, M 2 1.9 32.4 0.67 0.93 FALSE
Garrett, M 3 3.9 1.1 0.45 0.44 FALSE
Glasgow, S 2 1.6 37.9 0.71 0.96 FALSE
Guscoth, L 3 4.6 0.8 0.52 0.64 FALSE
Hadley, P 2 2.7 23.6 0.66 0.92 FALSE
Harten, J 2 1.9 48.3 0.80 0.96 FALSE
Hay, M 2 2.7 36.0 0.88 0.94 FALSE
Haythornthwaite, N 2 1.2 35.9 0.71 0.94 FALSE
Hinchliffe, T 3 5.8 4.8 0.62 0.95 FALSE
Horjus, G 2 0.7 40.6 0.80 0.93 FALSE
Housby, H 2 3.2 42.4 0.88 0.94 FALSE
Jenner, K 1 7.0 0.7 0.71 0.69 FALSE
Jovic, M 3 3.5 21.6 0.76 0.89 TRUE
Klau, S 1 8.9 6.1 0.67 0.93 FALSE
Koenen, C 2 0.7 47.3 0.67 0.98 FALSE
Kumwenda, M 2 1.0 48.2 0.75 0.98 FALSE
Langman, L 2 2.8 32.6 0.78 0.94 FALSE
Lewis, O 3 4.9 6.0 0.60 0.85 FALSE
Mannix, E 1 8.5 6.7 0.80 0.94 TRUE
Manu’a, K 3 4.0 1.8 0.51 0.69 FALSE
Maweni, P 1 7.8 5.7 0.65 0.99 FALSE
McAuliffe, M 3 4.2 5.4 0.83 0.85 FALSE
Mentor, G 1 9.6 5.8 0.76 0.96 FALSE
Mi Mi, J 2 1.6 29.9 0.85 0.91 FALSE
Moloney, K 2 2.4 25.1 0.76 0.96 FALSE
Moore, L 3 5.5 1.7 0.60 0.88 FALSE
Nankivell, M 3 3.5 24.1 0.62 0.95 TRUE
Nelson, S 2 0.6 44.7 0.67 0.99 FALSE
Parmenter, A 1 7.6 3.1 0.83 0.69 FALSE
Petty, H 2 3.5 27.0 0.67 0.89 FALSE
Philip, T 2 0.5 42.4 0.50 0.95 FALSE
Pitman, C 2 1.3 34.4 0.68 0.90 FALSE
Poolman, S 3 5.7 7.1 0.67 0.99 FALSE
Potgieter, L 2 0.9 41.1 0.52 0.98 FALSE
Pretorius, K 1 7.3 1.0 0.90 0.62 FALSE
Price, J 3 5.8 30.0 0.91 0.94 TRUE
Proscovia, P 2 1.6 39.1 1.00 0.98 FALSE
Proud, M 2 3.1 45.8 0.76 0.94 FALSE
Russell, J 3 4.4 5.5 0.95 0.81 FALSE
Scherian, L 2 0.6 25.5 0.62 0.89 FALSE
Shimmin, K 3 4.6 1.9 0.46 0.50 FALSE
Simpson, G 3 3.9 2.0 0.84 0.62 FALSE
Sinclair, G 2 1.1 43.4 0.74 0.94 FALSE
Sterling, S 1 10.4 6.2 0.70 0.92 TRUE
Teague-Neeld, A 2 2.1 44.3 0.56 0.95 FALSE
Thwaites, C 2 0.7 39.5 0.42 0.98 FALSE
Turner, M 1 8.5 0.4 0.61 0.50 FALSE
Wallace, S 2 1.4 58.6 0.82 0.97 TRUE
Ward, J 1 7.3 0.4 0.65 0.24 FALSE
Watson, L 2 2.0 46.6 0.73 0.96 FALSE
Weston, J 3 4.7 0.9 0.56 0.46 FALSE
Wood, S 2 0.9 48.9 0.73 0.97 FALSE
van der Merwe, S 3 3.9 2.0 0.83 0.57 FALSE
library(ggrepel) # package to manage label in the graph
P2 <- ggplot(data = df3, mapping = aes(x = Efficiency_Att, y = Activity_Off)) +
  geom_point(aes(color = Clusters)) +
  facet_wrap(. ~ Clusters,scales = "free")+
  geom_label_repel(aes(label = ifelse(top3_thresold, as.character(Athlete_Name),"")))+
  labs(title = "Graph 3 : Offensive Analysis - Efficiency vs Acticity ",
         subtitle = "Find the best player per profil, which player is working most efficiently ?",
         caption = "Performance is a balance between work and efficiency !")
P2

Interpretations :

This graph shows the balance between the player’s activity during the season, but also his efficiency. This allows us to determine which player has done the most work efficiently. In this case, we are concerned with evaluating the quality of our players on the offensive activity.

TOP 3: Price, Nankivell and Jovic, are the best performing players in the Kmean category “Mixed activity”,

TOP 3: Garbin, Fowler and Wallace, are the best performing players in the Kmean category “Attacking dominance activity”,

TOP 3: Mannix, Delaney and Sterling, are the best performing players in the Kmean category “Defending dominance activity”.

Finally, for our team, the most relevant choices seem : Garbin, Fowler and Wallace!

And about defense ?

In order to remain one of the best defenses in the league, it would be interesting that our player is a minimum of faculty in the defensive system…

# Creat filtred dataframe with the top players
df4 <- df3 %>% filter(Athlete_Name =="Garbin, S"|
                      Athlete_Name =="Fowler, J"|
                      Athlete_Name =="Wallace, S")

P3 <- ggplot(data = df4, mapping = aes(x = Efficiency_Def, y = Activity_Def)) +
  geom_col(aes(fill = Athlete_Name)) +
  scale_fill_viridis_d()+
  labs(title = "Graph 4 : TOP 3 players in Attack ",
         subtitle = "which one are the best defender ?",
         caption = " WE HAVE TO CALL Garbin,S!")
P3

Interpretations :

This graph shows the profitability of the top 3 players isolated as mentioned above. This allows us to determine which player has done the most work efficiently balance. In this case, we focused on the defensive activity.

📢 WE WANT GARBIN,S !!!! 📢

This player fits our needs best. In addition to his offensive activity of 62 actions per game with an efficiency of 95%. He also knows how to be efficient in a defensive system with 3 actions per game, 93% efficiency 💪🏽.